<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta http-equiv="x-ua-compatible" content="ie=edge">
    
    <title>6.1. Understanding the Competition Task &#8212; FunRec Recommender Systems 0.0.1 documentation</title>

    <link rel="stylesheet" href="../_static/material-design-lite-1.3.0/material.blue-deep_orange.min.css" type="text/css" />
    <link rel="stylesheet" href="../_static/sphinx_materialdesign_theme.css" type="text/css" />
    <link rel="stylesheet" href="../_static/fontawesome/all.css" type="text/css" />
    <link rel="stylesheet" href="../_static/fonts.css" type="text/css" />
    <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="../_static/basic.css" />
    <link rel="stylesheet" type="text/css" href="../_static/d2l.css" />
    <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
    <script src="../_static/jquery.js"></script>
    <script src="../_static/underscore.js"></script>
    <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="../_static/doctools.js"></script>
    <script src="../_static/sphinx_highlight.js"></script>
    <script src="../_static/d2l.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="6.2. Baseline" href="2.baseline.html" />
    <link rel="prev" title="6. Project Practice" href="index.html" /> 
  </head>
<body>
    <div class="mdl-layout mdl-js-layout mdl-layout--fixed-header mdl-layout--fixed-drawer"><header class="mdl-layout__header mdl-layout__header--waterfall ">
    <div class="mdl-layout__header-row">
        
        <nav class="mdl-navigation breadcrumb">
            <a class="mdl-navigation__link" href="index.html"><span class="section-number">6. </span>Project Practice</a><i class="material-icons">navigate_next</i>
            <a class="mdl-navigation__link is-active"><span class="section-number">6.1. </span>Understanding the Competition Task</a>
        </nav>
        <div class="mdl-layout-spacer"></div>
        <nav class="mdl-navigation">
        
<form class="form-inline pull-sm-right" action="../search.html" method="get">
      <div class="mdl-textfield mdl-js-textfield mdl-textfield--expandable mdl-textfield--floating-label mdl-textfield--align-right">
        <label id="quick-search-icon" class="mdl-button mdl-js-button mdl-button--icon"  for="waterfall-exp">
          <i class="material-icons">search</i>
        </label>
        <div class="mdl-textfield__expandable-holder">
          <input class="mdl-textfield__input" type="text" name="q"  id="waterfall-exp" placeholder="Search" />
          <input type="hidden" name="check_keywords" value="yes" />
          <input type="hidden" name="area" value="default" />
        </div>
      </div>
      <div class="mdl-tooltip" data-mdl-for="quick-search-icon">
      Quick search
      </div>
</form>
        
<a id="button-show-source"
    class="mdl-button mdl-js-button mdl-button--icon"
    href="../_sources/chapter_5_projects/1.understanding.rst.txt" rel="nofollow">
  <i class="material-icons">code</i>
</a>
<div class="mdl-tooltip" data-mdl-for="button-show-source">
Show Source
</div>
        </nav>
    </div>
    <div class="mdl-layout__header-row header-links">
      <div class="mdl-layout-spacer"></div>
      <nav class="mdl-navigation">
          
              <a  class="mdl-navigation__link" href="https://funrec-notebooks.s3.eu-west-3.amazonaws.com/fun-rec.zip">
                  <i class="fas fa-download"></i>
                  Jupyter Notebooks
              </a>
          
              <a  class="mdl-navigation__link" href="https://github.com/datawhalechina/fun-rec">
                  <i class="fab fa-github"></i>
                  GitHub
              </a>
      </nav>
    </div>
</header><header class="mdl-layout__drawer">
    
          <!-- Title -->
      <span class="mdl-layout-title">
          <a class="title" href="../index.html">
              <span class="title-text">
                  FunRec Recommender Systems
              </span>
          </a>
      </span>
    
    
      <div class="globaltoc">
        <span class="mdl-layout-title toc">Table Of Contents</span>
        
        
            
            <nav class="mdl-navigation">
                <ul>
<li class="toctree-l1"><a class="reference internal" href="../chapter_preface/index.html">Preface</a></li>
<li class="toctree-l1"><a class="reference internal" href="../chapter_installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../chapter_notation/index.html">Notation</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../chapter_0_introduction/index.html">1. Overview of Recommender Systems</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../chapter_0_introduction/1.intro.html">1.1. What Is a Recommender System?</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_0_introduction/2.outline.html">1.2. Overview of This Book</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../chapter_1_retrieval/index.html">2. Retrieval Models</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../chapter_1_retrieval/1.cf/index.html">2.1. Collaborative Filtering</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/1.cf/1.itemcf.html">2.1.1. Item-Based Collaborative Filtering</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/1.cf/2.usercf.html">2.1.2. User-Based Collaborative Filtering</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/1.cf/3.mf.html">2.1.3. Matrix Factorization</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/1.cf/4.summary.html">2.1.4. Summary</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_1_retrieval/2.embedding/index.html">2.2. Embedding-Based Retrieval</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/2.embedding/1.i2i.html">2.2.1. I2I Retrieval</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/2.embedding/2.u2i.html">2.2.2. U2I Retrieval</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/2.embedding/3.summary.html">2.2.3. Summary</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_1_retrieval/3.sequence/index.html">2.3. Sequential Retrieval</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/3.sequence/1.user_interests.html">2.3.1. Deepening User Interest Representation</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/3.sequence/2.generateive_recall.html">2.3.2. Generative Retrieval Methods</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_1_retrieval/3.sequence/3.summary.html">2.3.3. Summary</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../chapter_2_ranking/index.html">3. Ranking Models</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../chapter_2_ranking/1.wide_and_deep.html">3.1. Memorization and Generalization</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_2_ranking/2.feature_crossing/index.html">3.2. Feature Crossing</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/2.feature_crossing/1.second_order.html">3.2.1. Second-Order Feature Crossing</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/2.feature_crossing/2.higher_order.html">3.2.2. Higher-Order Feature Crossing</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/2.feature_crossing/3.summary.html">3.2.3. Summary</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_2_ranking/3.sequence.html">3.3. Sequence Modeling</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_2_ranking/4.multi_objective/index.html">3.4. Multi-Objective Modeling</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/4.multi_objective/1.arch.html">3.4.1. Evolution of Basic Architectures</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/4.multi_objective/2.dependency_modeling.html">3.4.2. Task Dependency Modeling</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/4.multi_objective/3.multi_loss_optim.html">3.4.3. Multi-Objective Loss Fusion</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/4.multi_objective/4.summary.html">3.4.4. Summary</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_2_ranking/5.multi_scenario/index.html">3.5. Multi-Scenario Modeling</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/5.multi_scenario/1.multi_tower.html">3.5.1. Multi-Tower Architectures</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/5.multi_scenario/2.dynamic_weight.html">3.5.2. Dynamic Weight Modeling</a></li>
<li class="toctree-l3"><a class="reference internal" href="../chapter_2_ranking/5.multi_scenario/3.summary.html">3.5.3. Summary</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../chapter_3_rerank/index.html">4. Re-ranking Models</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../chapter_3_rerank/1.greedy.html">4.1. Greedy Re-ranking</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_3_rerank/2.personalized.html">4.2. Personalized Re-ranking</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_3_rerank/3.summary.html">4.3. Chapter Summary</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../chapter_4_trends/index.html">5. Challenges and Research Hot Topics</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../chapter_4_trends/1.debias.html">5.1. Model Debiasing</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_4_trends/2.cold_start.html">5.2. The Cold-Start Problem</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_4_trends/3.generative.html">5.3. Generative Recommendation</a></li>
<li class="toctree-l2"><a class="reference internal" href="../chapter_4_trends/4.summary.html">5.4. Chapter Summary</a></li>
</ul>
</li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">6. Project Practice</a><ul class="current">
<li class="toctree-l2 current"><a class="current reference internal" href="#">6.1. Understanding the Competition Task</a></li>
<li class="toctree-l2"><a class="reference internal" href="2.baseline.html">6.2. Baseline</a></li>
<li class="toctree-l2"><a class="reference internal" href="3.analysis.html">6.3. Data Analysis</a></li>
<li class="toctree-l2"><a class="reference internal" href="4.recall.html">6.4. Multi-Channel Recall</a></li>
<li class="toctree-l2"><a class="reference internal" href="5.feature_engineering.html">6.5. Feature Engineering</a></li>
<li class="toctree-l2"><a class="reference internal" href="6.ranking.html">6.6. Ranking Models</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../chapter_appendix/index.html">7. Appendix</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../chapter_appendix/word2vec.html">7.1. Word2vec</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../chapter_references/references.html">References</a></li>
</ul>

            </nav>
        
        </div>
    
</header>
        <main class="mdl-layout__content" tabIndex="0">

	<script type="text/javascript" src="../_static/sphinx_materialdesign_theme.js "></script>

    <div class="document">
        <div class="page-content" role="main">
        
  <section id="id1">
<h1><span class="section-number">6.1. </span>Understanding the Competition Task<a class="headerlink" href="#id1" title="Permalink to this heading">¶</a></h1>
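<p>Understanding the task is the foundation for tackling any competition.
It shapes later work such as feature engineering and model building,
and it sets the direction of everything that follows.
A correct grasp of the ideas behind the task and a clear view of its business logic
help us build more effective features and models in less time.
In every competition, understanding the task is a crucial first step that has to be done well.
In this section we start from that step:
first we look at the overview and data of this competition and sketch how the task can be approached,
then we study the evaluation metric, and finally we summarize some lessons about task understanding.</p>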
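<p>This competition is the <a class="reference external" href="https://tianchi.aliyun.com/competition/entrance/531842?spm=5176.12281973.J_6-HJZaSjQocH7SIdvbK02.1.724d3b74lGCqEO">User Behavior Prediction Challenge in the News Recommendation Scenario</a>,
set against the background of news recommendation in a news app.
The goal is <strong>to predict a user's future click behavior from the record of news articles
the user has browsed and clicked in the past, that is, to predict the news article of the user's last click</strong>.
The task was designed to introduce participants to the business background of recommender systems
and to have them solve a practical problem.</p>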
<section id="id2">
<h2><span class="section-number">6.1.1. </span>Data Overview<a class="headerlink" href="#id2" title="Permalink to this heading">¶</a></h2>
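<p>The data comes from the user interaction logs of a news app platform. It covers 300,000 users, nearly 3 million clicks, and more than 360,000 distinct news articles, and each article has a corresponding embedding vector. To keep the competition fair, the click logs of 200,000 users were sampled as the training set, those of 50,000 users as test set A, and those of another 50,000 users as test set B. For the exact tables and fields,
refer to the competition description. Below we discuss how to make sense of data like this
so that the next steps can proceed effectively.</p>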
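<p>A first look at a click log usually means counting users, clicks, and distinct articles,
the same three figures quoted above. The following is a minimal sketch on a tiny in-memory sample.
The column names <code>user_id</code>, <code>article_id</code>, and <code>click_timestamp</code> are assumptions made for illustration;
check the competition description for the real field names.</p>

```python
import csv
import io

# Hypothetical sample of a click log; the column names are assumptions,
# not taken from the competition description.
SAMPLE_LOG = """user_id,article_id,click_timestamp
u1,a1,100
u1,a2,105
u2,a2,101
u2,a3,110
u2,a1,120
u3,a3,102
"""


def summarize_clicks(rows):
    """Return (number of users, number of clicks, number of distinct articles)."""
    users, articles, clicks = set(), set(), 0
    for row in rows:
        users.add(row["user_id"])
        articles.add(row["article_id"])
        clicks += 1
    return len(users), clicks, len(articles)


rows = list(csv.DictReader(io.StringIO(SAMPLE_LOG)))
n_users, n_clicks, n_articles = summarize_clicks(rows)
print(f"users={n_users}, clicks={n_clicks}, articles={n_articles}")
```

<p>On the real data the same counts should come out close to the figures in the competition description; a large mismatch would suggest a loading or parsing problem.</p>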
</section>
<section id="id3">
<h2><span class="section-number">6.1.2. </span>Understanding the Evaluation Metric<a class="headerlink" href="#id3" title="Permalink to this heading">¶</a></h2>
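<p>To understand the evaluation metric, we need to read it together with the final submission file.
According to sample.submit.csv, the submission lists five recommended articles for each user,
ordered from highest to lowest click probability.
Each user's actual last click is a single article, so there is exactly one ground-truth answer per user,
and the question is whether any of the five recommendations hits it. For user1, for example,
the submission would be: user1, article1, article2, article3, article4,
article5.</p>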
<p>The evaluation metric is defined as follows:</p>
<div class="math notranslate nohighlight" id="equation-chapter-5-projects-1-understanding-0">
<span class="eqno">(6.1.1)<a class="headerlink" href="#equation-chapter-5-projects-1-understanding-0" title="Permalink to this equation">¶</a></span>\[score(\text{user}) = \sum_{k=1}^5 \frac{s(\text{user}, k)}{k}\]</div>
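<p>Here <span class="math notranslate nohighlight">\(s(\text{user},k)\)</span> is 1 if the <span class="math notranslate nohighlight">\(k\)</span>-th recommended article is the article the user actually clicked, and 0 otherwise.
Suppose article1 is the article the user actually clicked, that is, article1 is the hit.
Then <span class="math notranslate nohighlight">\(s(\text{user1},1)=1\)</span> and <span class="math notranslate nohighlight">\(s(\text{user1},k)=0\)</span> for <span class="math notranslate nohighlight">\(k=2,\dots,5\)</span>, so <span class="math notranslate nohighlight">\(score(\text{user1})=1\)</span>.
If instead article2 is the clicked article,
then <span class="math notranslate nohighlight">\(s(\text{user1},2)=1\)</span> and <span class="math notranslate nohighlight">\(s(\text{user1},k)=0\)</span> for <span class="math notranslate nohighlight">\(k\in\{1,3,4,5\}\)</span>, so <span class="math notranslate nohighlight">\(score(\text{user1})=1/2\)</span>. In other words, <span class="math notranslate nohighlight">\(score(\text{user})\)</span> is the reciprocal of the position at which the hit occurs. If none of the five articles is a hit,
then <span class="math notranslate nohighlight">\(score(\text{user1})=0\)</span>. This is reasonable:
we want the hit to appear as early in the list as possible, and that is exactly when the score is highest.</p>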
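<p>The per-user score can be sketched in a few lines of Python.
The function names <code>score_user</code> and <code>mean_score</code> are hypothetical,
and averaging the per-user scores over all users is an assumption about how the leaderboard aggregates them;
the formula above only defines the score of a single user.</p>

```python
def score_user(recommended, clicked):
    """Score one user: 1/k if the clicked article is at rank k (1-based), else 0."""
    for k, article in enumerate(recommended[:5], start=1):
        if article == clicked:
            return 1.0 / k
    return 0.0


def mean_score(submission, truth):
    """Average the per-user scores over all users in `truth` (assumed aggregation)."""
    total = sum(score_user(submission.get(u, []), a) for u, a in truth.items())
    return total / len(truth)


# Hypothetical submission: five ranked articles per user.
submission = {
    "user1": ["article1", "article2", "article3", "article4", "article5"],
    "user2": ["article3", "article4", "article5", "article6", "article7"],
}
# Hypothetical ground truth: the article each user actually clicked last.
truth = {"user1": "article2", "user2": "article9"}

print(score_user(submission["user1"], truth["user1"]))  # hit at rank 2
print(score_user(submission["user2"], truth["user2"]))  # no hit
print(mean_score(submission, truth))
```

<p>In this example user1 is hit at position 2 and user2 is not hit at all, so the two per-user scores are 0.5 and 0.0.</p>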
</section>
<section id="id4">
<h2><span class="section-number">6.1.3. </span>Problem Analysis<a class="headerlink" href="#id4" title="Permalink to this heading">¶</a></h2>
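<p>Following the task description, we first need to be clear about the goal of this competition:
predict the news article of a user's last click from the user's history of browsing and clicking news.
Seen this way, the competition differs from the ordinary structured-data competitions we have met before
in two main respects:</p>
<ul class="simple">
<li><p>The target. We have to predict the article of the last click, which means recommending news articles to users,
rather than predicting a number or a class label as in earlier problems.</p></li>
<li><p>The data. The data provided is not in the familiar features-plus-label form.
It comes from a real business scenario and consists of users' click logs.</p></li>
</ul>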
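<p>So when we get this task, the way to think about it is to work from the goal and <strong>turn the prediction problem into a supervised learning problem (features + labels), after which we can model it with ML or DL methods</strong>. Several questions then come up naturally: How do we turn it into a supervised learning problem?
What kind of supervised learning problem should it be? Which features can we use?
Which models can we try?
And with several hundred thousand candidate articles to recommend from, what strategies do we have?</p>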
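<p>Of course, the answers will not all appear the moment we read the task description,
but once the questions are on the table we can start working on them.
Take the second one: what kind of supervised learning problem should this be?
Since we are predicting the article of a user's last click, picking one article out of 360,000, the first idea might be a multi-class classification problem (choose 1 of 360,000 classes).
A classification problem of that size is hard to handle, so can we reformulate it?
If we can predict the probability that a given user's last click falls on a given article,
the original problem is solved indirectly:
the article with the highest probability is the one the user is most likely to click last.
The original problem thus becomes a click-through-rate prediction problem, (user, article) &#8594;
click probability (soft classification),
which is a familiar supervised classification problem. This already gives us
a rough direction for choosing a model, the simplest option being logistic regression.</p>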
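<p>One possible way to produce (user, article) rows with a click label is sketched below.
It holds out each user's last click as the positive example and samples a few articles the user never clicked as negatives.
The function <code>build_samples</code>, its parameters, and the negative-sampling scheme are illustrative assumptions,
not the procedure prescribed by the competition.</p>

```python
import random


def build_samples(click_log, all_articles, n_negatives=2, seed=0):
    """Turn click histories into (user, article, label) rows.

    click_log maps each user to a list of article ids in click order.
    The last click is held out as the positive (label 1); a few articles
    the user never clicked are sampled as negatives (label 0).
    """
    rng = random.Random(seed)
    samples = []
    for user, history in click_log.items():
        if len(history) < 2:
            continue  # need at least one earlier click plus a target
        target = history[-1]
        samples.append((user, target, 1))
        unseen = sorted(set(all_articles) - set(history))
        for article in rng.sample(unseen, min(n_negatives, len(unseen))):
            samples.append((user, article, 0))
    return samples


# Hypothetical toy data.
click_log = {"u1": ["a1", "a2", "a3"], "u2": ["a2", "a4"], "u3": ["a5"]}
all_articles = ["a1", "a2", "a3", "a4", "a5", "a6"]

for row in build_samples(click_log, all_articles):
    print(row)
```

<p>Rows in this form, joined with user and article features, could then be fed to a binary classifier such as logistic regression. User u3 is skipped because a single click leaves no history to learn from.</p>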
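<p>We now have a rough plan for the task:
recast it as a classification problem
whose label is whether a user clicks a given article and whose features describe the user and the article,
then train a classifier that predicts the probability that a user's last click is on a given article.
This raises further questions: How do we turn it into a supervised learning problem? How do we build the training and test sets?
Which features can we use? Which models can we try? With 360,000 articles
and more than 200,000 users to recommend for,
what strategies can shrink the problem? And how do we make the final prediction?</p>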
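<p>One common way to shrink the problem is a two-stage pipeline: first recall a small candidate set per user, then score only those candidates and keep the top five.
The sketch below uses global popularity as a stand-in recall rule and a placeholder scoring function.
Both choices, the function names, and the decision to skip articles the user has already clicked are assumptions made for illustration.</p>

```python
from collections import Counter


def recall_candidates(click_log, user, n_candidates=10):
    """Recall stage: keep only the most-clicked articles the user has not seen."""
    popularity = Counter(a for history in click_log.values() for a in history)
    seen = set(click_log.get(user, []))
    # Sort by click count (descending), breaking ties by article id.
    ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    return [a for a, _ in ranked if a not in seen][:n_candidates]


def recommend(click_log, user, score_fn, top_k=5, n_candidates=10):
    """Ranking stage: score the recalled candidates and return the top_k."""
    candidates = recall_candidates(click_log, user, n_candidates)
    ranked = sorted(candidates, key=lambda a: (-score_fn(user, a), a))
    return ranked[:top_k]


# Hypothetical toy data.
click_log = {
    "u1": ["a1", "a2"],
    "u2": ["a2", "a3"],
    "u3": ["a3", "a4"],
    "u4": ["a3", "a5"],
}


def constant_score(user, article):
    """Placeholder for a trained model's predicted click probability."""
    return 0.0


print(recall_candidates(click_log, "u1", n_candidates=2))
print(recommend(click_log, "u1", constant_score))
```

<p>With a trained classifier in place of <code>constant_score</code>, the model only has to score a handful of candidates per user instead of every article.</p>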
</section>
</section>


        </div>
        <div class="side-doc-outline">
            <div class="side-doc-outline--content"> 
<div class="localtoc">
    <p class="caption">
      <span class="caption-text">Table Of Contents</span>
    </p>
    <ul>
<li><a class="reference internal" href="#">6.1. Understanding the Competition Task</a><ul>
<li><a class="reference internal" href="#id2">6.1.1. Data Overview</a></li>
<li><a class="reference internal" href="#id3">6.1.2. Understanding the Evaluation Metric</a></li>
<li><a class="reference internal" href="#id4">6.1.3. Problem Analysis</a></li>
</ul>
</li>
</ul>

</div>
            </div>
        </div>

      <div class="clearer"></div>
    </div><div class="pagenation">
     <a id="button-prev" href="index.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="botton" accesskey="P">
         <i class="pagenation-arrow-L fas fa-arrow-left fa-lg"></i>
         <div class="pagenation-text">
            <span class="pagenation-direction">Previous</span>
            <div>6. Project Practice</div>
         </div>
     </a>
     <a id="button-next" href="2.baseline.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="botton" accesskey="N">
         <i class="pagenation-arrow-R fas fa-arrow-right fa-lg"></i>
        <div class="pagenation-text">
            <span class="pagenation-direction">Next</span>
            <div>6.2. Baseline</div>
        </div>
     </a>
  </div>
        
        </main>
    </div>
  </body>
</html>